tests_gaudi: Added L2 vllm workload #329
Conversation
tests/gaudi/l2/README.md
Outdated
@@ -74,4 +74,83 @@ Welcome to HCCL demo
[BENCHMARK] NW Bandwidth : 258.209121 GB/s
[BENCHMARK] Algo Bandwidth : 147.548069 GB/s
####################################################################################################
```

## VLLM
vLLM
tests/gaudi/l2/README.md
Outdated
```

## VLLM
VLLM is a serving engine for LLMs. The following workload deploys a VLLM server with an LLM using Intel Gaudi. Refer to the [Intel Gaudi VLLM fork](https://github.com/HabanaAI/vllm-fork.git) for more details.
vLLM
Build the workload container image:
```
$ oc apply -f https://raw.githubusercontent.com/intel/intel-technology-enabling-for-openshift/main/tests/gaudi/l2/vllm_buildconfig.yaml
```
Could we add an instruction so the user knows whether the build succeeded? :-)
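For example, a verification step along these lines could work; the build name below is an assumption, since the actual name depends on the BuildConfig metadata in this PR:

```
# List builds created by the BuildConfig and watch for STATUS "Complete"
oc get builds

# Follow the build logs; the build name is an assumption -- use the name
# reported by `oc get builds` (typically <buildconfig-name>-1)
oc logs -f build/vllm-workload-1

# Once the build completes, confirm the resulting image stream tag exists
oc get istag
```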
```
Deploy the workload:
* Update the Hugging Face token and the PVC according to your cluster setup
```
Could we add some detail about setting the Hugging Face token, and also give a brief introduction to the model we are using? :-)
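For instance, something like the following could be documented for the token; the secret name and key are assumptions and would need to match whatever the deployment manifest actually references:

```
# Store the Hugging Face access token in a secret so the vLLM pod can pull
# gated models (secret/key names are assumptions; match vllm_deployment.yaml)
oc create secret generic hf-token --from-literal=HF_TOKEN=<your_huggingface_token>

# Verify the secret exists before deploying the workload
oc get secret hf-token
```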
tests/gaudi/l2/vllm_buildconfig.yaml
Outdated
runPolicy: "Serial"
source:
  git:
    uri: https://github.com/opendatahub-io/vllm.git
After comparing:
1. https://github.com/opendatahub-io/vllm.git - ODH vLLM fork
2. https://github.com/vllm-project/vllm - vLLM upstream
3. https://github.com/HabanaAI/vllm-fork - Habana vLLM fork

I think we should currently start from 3, with the change from 1 (adding the UBI-based Dockerfile for RH OpenShift). Intel is upstreaming from 3 to 2, so in the long run we will use 2.
So I think we need to: 1) submit a PR adding the UBI-based Dockerfile for RH and add RH 9.4 support to the documents; 2) use repo 3; 3) the owner of 3 will presumably also help upstream the UBI-based Dockerfile and docs to 2; 4) after that we can switch to 2, the upstream vLLM.
@vbedida79 any comments? :-)
HabanaAI/vllm-fork#190 is the PR adding the UBI Dockerfile from RH; once it is merged into the vLLM Gaudi fork repo, we can use that directly. For now we can use the UBI image maintained by RH in https://github.com/opendatahub-io/vllm.git, which is based on https://github.com/HabanaAI/vllm-fork.
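If the source repo or branch needs to change again later (for example once the UBI Dockerfile lands in the Habana fork), a rough sketch of re-pointing the BuildConfig without re-applying the YAML; the BuildConfig name and branch placeholder are assumptions:

```
# Point the existing BuildConfig at a different git repo/ref
# ("vllm-workload" is an assumed name; check `oc get buildconfigs`)
oc patch buildconfig vllm-workload --type=merge -p \
  '{"spec":{"source":{"git":{"uri":"https://github.com/HabanaAI/vllm-fork.git","ref":"<branch-or-tag>"}}}}'

# Re-run the build against the new source
oc start-build vllm-workload --follow
```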
Updated according to comments, please review. Thanks.
@@ -75,3 +75,104 @@ Welcome to HCCL demo
[BENCHMARK] Algo Bandwidth : 147.548069 GB/s
####################################################################################################
```
<<<<<<< HEAD
PR/git commit comments: please note that the buildconfig is based on HabanaAI/vllm-fork#602.
Sure, updated in the PR and git commit.
Build the workload container image:
```
git clone https://github.com/opendatahub-io/vllm.git --branch gaudi-main
```
Should we use the 1.18.0 branch?
Updated to v1.18.0, from the https://github.com/HabanaAI/vllm-fork/tree/v1.18.0 repo.
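For reference, cloning the fork at that version would look roughly like this (plain git usage; the branch/tag name is taken from the link above):

```
# Clone the Habana vLLM fork at the v1.18.0 release branch/tag
git clone https://github.com/HabanaAI/vllm-fork.git --branch v1.18.0 --depth 1
cd vllm-fork
```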
tests/gaudi/l2/vllm_deployment.yaml
Outdated
- containerPort: 8000
resources:
  limits:
    habana.ai/gaudi: 4
Could we check and confirm how many accelerators vLLM actually uses?
I suggest starting with only a single accelerator.
I can check with 1 accelerator and update.
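A quick sketch of what testing with a single accelerator could look like; the deployment name `vllm` is an assumption and should match whatever vllm_deployment.yaml actually creates:

```
# Scale the Gaudi request down to one accelerator on the vLLM deployment
# ("vllm" is an assumed deployment name; check with `oc get deployments`)
oc set resources deployment/vllm \
  --requests=habana.ai/gaudi=1 --limits=habana.ai/gaudi=1

# Watch the pod reschedule and confirm it starts with a single card
oc get pods -w
```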
vllm gaudi ubi image based on PR HabanaAI/vllm-fork#602 Signed-off-by: vbedida79 <[email protected]>
PR includes gaudi l2 vllm workload
Signed-off-by: vbedida79 [email protected]